Multidimensional attributes expose Heider balance dynamics to measurements

Most studied social interactions arise from dyadic relations. An exception is Heider balance theory, which postulates the existence of triad dynamics that have, however, been elusive to observe. Here, we discover a sufficient condition for the observability of Heider dynamics: assigning the edge signs according to multiple opinions of connected agents. Using longitudinal records of university students' mutual contacts and opinions, we create a coevolving network on which we introduce models of student interactions. These models account for multiple topics of individual student opinions, the influence of such opinions on dyadic relations, and the influence of triadic relations on opinions. We show that the triadic influence is empirically measurable for static and dynamic observables when signs of edges are defined by multidimensional differences between opinions on all topics. Yet, when these signs are defined by a difference between opinions on each topic separately, the triadic interactions’ influence is indistinguishable from noise.

[...] but now it is computed from the absolute values of the differences for all attributes and normalized by the new maximum distance C, i.e., the distance between agents i and j is $\frac{1}{C}\sum_t \Delta^{i,j}_t$, where $\Delta^{i,j}_t$ is the absolute value of the difference between the t-th principal components of agents i and j, and $C = \max_{i,j} \sum_t \Delta^{i,j}_t$ is the normalization constant chosen such that the considered tolerances Θ vary from 0 to 1. The distance distribution after this transformation is presented in Fig. S2. Further results for multi-edges without correlations are shown in section S5.
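The normalization described above can be illustrated with a minimal sketch. It assumes that the opinions have already been transformed into principal components; the function name `normalized_manhattan` and the input layout (one list of components per agent) are hypothetical and are not taken from the original analysis code.

```python
from itertools import combinations


def normalized_manhattan(components):
    """Pairwise Manhattan distances divided by the largest observed distance C.

    `components` is a list of equally long sequences, one per agent
    (assumed here to hold the agent's principal components).
    """
    raw = {
        (i, j): sum(abs(a - b) for a, b in zip(components[i], components[j]))
        for i, j in combinations(range(len(components)), 2)
    }
    c = max(raw.values())  # normalization constant C = max_{i,j} sum_t Delta_t^{i,j}
    if c == 0:
        # All agents are identical; every distance is zero.
        return {pair: 0.0 for pair in raw}
    return {pair: d / c for pair, d in raw.items()}


# Three agents with two components each.
example = normalized_manhattan([[0, 0], [1, 1], [3, 0]])
print(example[(0, 1)], example[(0, 2)], example[(1, 2)])
```

With this normalization, the largest distance in the data set equals 1, so tolerances between 0 and 1 span the whole range of observed distances.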

S3 Generated signed networks: overview and statistics
Signed networks generated for multi-edges depend on the tolerance Θ. Fig. S3 presents the network for the first term for three different tolerance values. As expected, for Θ = 0, there are only two positive links and most triads are unbalanced. The largest component starts having both balanced and unbalanced triads from Θ = 0.1875. If Θ ≥ 0.75, there is at most one negative link in the network.
The numbers of triads in each term are listed in Fig. 3. Considering all the triads from all the terms, the exact numbers of balanced and unbalanced triads p_D(Θ) are shown in Fig. S4 and Fig. S5. Fig. S4a shows the results in the case of simple triads for different topics, whereas Fig. S5a presents the change of numbers of triads depending on the tolerance value. The b-panels of those figures also show histograms of specific triad types.
In the case of simple triads, most triads are balanced. This is not surprising. Unbalanced triads require three nodes to have −1, 0 and +1 opinions, which did not often happen. This is also why simple triads are insufficient to observe triad dynamics. Fig. S4b, as defined, does not show triads with one negative edge. Most of the balanced triads are triads with two negative edges.
Fig. 5b in the main paper shows, among others, triad transition probabilities T assuming the triad exists in two consecutive terms. Here, apart from considering balanced and unbalanced triads, Fig. S6 includes the possibility that a triad will vanish in the next term. Transition probabilities among three states (balanced, unbalanced, absent) are denoted as T_3. Triad disappearance is the main factor of triad changes for all tolerance values. For most Θ values, unbalanced triads are more likely to dissolve.
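The dependence of the signed network on Θ can be sketched as follows. The sketch assumes that a link is positive when the normalized distance between the two agents does not exceed Θ and negative otherwise (the exact inequality is an assumption), and it classifies triads by their number of negative edges; the names `sign_edges` and `triad_type_counts` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def sign_edges(distances, theta):
    """Assign +1 to links with distance <= theta and -1 otherwise (assumed rule)."""
    return {frozenset(pair): (1 if d <= theta else -1) for pair, d in distances.items()}


def triad_type_counts(signs):
    """Count closed triads by their number of negative edges (0, 1, 2 or 3)."""
    nodes = sorted({v for edge in signs for v in edge})
    counts = Counter()
    for trio in combinations(nodes, 3):
        edges = [frozenset(p) for p in combinations(trio, 2)]
        if all(e in signs for e in edges):  # keep only fully connected triples
            counts[sum(signs[e] < 0 for e in edges)] += 1
    return counts


# One closed triad (0, 1, 2) and one extra link that closes no triad.
distances = {(0, 1): 0.0, (1, 2): 0.5, (0, 2): 1.0, (2, 3): 0.2}
for theta in (0.0, 0.5, 1.0):
    print(theta, dict(triad_type_counts(sign_edges(distances, theta))))
```

In Heider's sense, triads with zero or two negative edges are balanced, and those with one or three negative edges are unbalanced.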
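A minimal sketch of how such three-state transition frequencies could be estimated from two consecutive terms is given below. It is a simplification that only follows triads existing in the earlier term, and the state labels and the function name `transition_probabilities` are hypothetical.

```python
from collections import Counter

STATES = ("balanced", "unbalanced", "absent")


def transition_probabilities(states_now, states_next):
    """Estimate transition frequencies for triads present in the current term.

    Both arguments map a triad (e.g. a tuple of node ids) to "balanced" or
    "unbalanced"; a triad missing from `states_next` is treated as "absent".
    """
    counts = Counter()
    for triad, state in states_now.items():
        counts[(state, states_next.get(triad, "absent"))] += 1
    probabilities = {}
    for src in ("balanced", "unbalanced"):
        total = sum(counts[(src, dst)] for dst in STATES)
        if total:  # skip source states that were not observed
            probabilities[src] = {dst: counts[(src, dst)] / total for dst in STATES}
    return probabilities


now = {(1, 2, 3): "balanced", (1, 2, 4): "unbalanced", (2, 3, 4): "unbalanced"}
nxt = {(1, 2, 3): "balanced", (1, 2, 4): "balanced"}
print(transition_probabilities(now, nxt))
```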

S4 Statistical testing of comparisons between data and random models
As mentioned in the main text, for models A_8 and C_8, the densities of triads obtained for consecutive values of Θ are correlated. The reason is as follows. In those models, students' attributes are shuffled either without (A_8) or with (C_8) keeping the vectors of attributes intact. Then, for each shuffling, the density of balanced triads as a function of tolerance is obtained. Therefore, the density for a specific Θ, p_M(Θ), will be similar to p_M(Θ ± ∆Θ). Hence, the tests for specific values of Θ depend on each other, which prohibits the use of family-wise error rate corrections. Therefore, the goal was to create an aggregated test. Our reasoning is as follows. After each model realization, we obtain a curve p_M(Θ). We know the values of those curves only for the considered values of Θ (16 values in total). Hence, we need to compare the obtained values for each Θ separately. Thus, for each considered Θ, we ranked the obtained densities p_M(Θ). Then, the statistic for each randomized data set is the sum of its ranks over the different tolerances. Fig. S7 shows the histogram of this statistic for the model C_8. It is a bell-shaped distribution, and the Shapiro-Wilk test could not reject the hypothesis of normality (a p-value of 0.10 was obtained). However, as this type of test is non-parametric, we calculated the p-value of the test of whether the model can describe the real data in the corresponding non-parametric way. We did not use methods assuming normality.
Thus, the p-value of the test of whether the random model describes the data set well is defined as the probability that the random model yields a sum of ranks higher than or equal to that of the real-data curve. For A_8 and C_8, we obtained p-values of 0.038 and 0.005, which allowed us to reject the corresponding null hypotheses.
The case of model E_8 is different. Here, for each tolerance, the density of positive links is obtained, and this density is further used to generate new signs. As a result, for each tolerance, the created networks are not directly related to each other and the family-wise error rate can be controlled.
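The aggregated rank-sum test described above can be illustrated with the following sketch. It assumes that, for each tolerance, the real-data density is ranked together with the model realizations and that ties receive average ranks; these details, as well as the function names, are assumptions made for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks of `values`; tied entries share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def rank_sum_p_value(data_curve, model_curves):
    """Sum the per-tolerance ranks of every curve and compare data with the model.

    Returns the rank sum of the real-data curve and the fraction of model
    realizations whose rank sum is higher than or equal to it.
    """
    curves = [list(data_curve)] + [list(c) for c in model_curves]  # index 0 = data
    totals = [0.0] * len(curves)
    for t in range(len(curves[0])):
        column = [curve[t] for curve in curves]
        for idx, rank in enumerate(average_ranks(column)):
            totals[idx] += rank
    data_stat, model_stats = totals[0], totals[1:]
    p_value = sum(s >= data_stat for s in model_stats) / len(model_stats)
    return data_stat, p_value


# Toy example: three tolerances, two model realizations.
print(rank_sum_p_value([0.9, 0.9, 0.9], [[0.1, 0.2, 0.3], [0.2, 0.1, 0.4]]))  # (9.0, 0.0)
```

Because the statistic aggregates all tolerances into one number per curve, a single test is performed per model, and no correction for multiple dependent tests is needed.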
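The randomizations discussed in this section can be sketched as below. The sketch assumes that model A shuffles every attribute independently across students, that model C shuffles whole opinion vectors, and that model E redraws each sign independently with the observed density of positive links; these readings, and all function names, are assumptions rather than a description of the original code.

```python
import random


def shuffle_attributes_independently(opinions, rng):
    """Model A-like shuffle (assumed): permute each attribute separately."""
    students = list(opinions)
    n_attrs = len(opinions[students[0]])
    columns = []
    for a in range(n_attrs):
        column = [opinions[s][a] for s in students]
        rng.shuffle(column)
        columns.append(column)
    return {s: tuple(col[k] for col in columns) for k, s in enumerate(students)}


def shuffle_vectors(opinions, rng):
    """Model C-like shuffle (assumed): permute intact opinion vectors."""
    students = list(opinions)
    vectors = [opinions[s] for s in students]
    rng.shuffle(vectors)
    return dict(zip(students, vectors))


def resample_signs(signs, rng):
    """Model E-like randomization (assumed): redraw signs with the observed density."""
    density = sum(s > 0 for s in signs.values()) / len(signs)
    return {edge: (1 if rng.random() < density else -1) for edge in signs}


rng = random.Random(0)
opinions = {"a": (1, 0), "b": (-1, 1), "c": (0, -1)}
print(shuffle_attributes_independently(opinions, rng))
print(shuffle_vectors(opinions, rng))
print(resample_signs({("a", "b"): 1, ("b", "c"): -1, ("a", "c"): 1}, rng))
```

The first function destroys correlations between attributes of the same student, whereas the second one preserves them, which is the distinction between keeping and not keeping the vectors of attributes intact.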

S5 Static properties of triads for multi-edges without correlations
Having defined multi-edges with correlations removed (see section S2), we performed comparisons to random models (A_8, E_8 and C_8) similar to those presented in Fig. 4 in the main text. The results are presented in Fig. S8. The form of the plots differs from that in the main text because, for the transformed variables, the number of considered tolerance values is much larger.
The comparison between the data and models A_8 and C_8 tells us that the curve for the real data is significantly higher, with p-values of < 0.001 and 0.011, respectively. The specific identified ranges of tolerance values, which the model does not explain, are as follows: [...] Therefore, when using multi-edges without correlations, we obtain larger tolerance ranges in which the random models are unable to explain the observed densities of balanced triads. However, the main difference between the results obtained for simple Manhattan-defined edges and for multi-edges without correlations is the meaning of the tolerance Θ. The results for both definitions of signed links are similar; thus, we preferred to show the results of the former in the main text because of their simplicity and the more natural interpretation of the tolerance.

S6 Comparing data and models with a small number of opinions
We tested models A, E and C in the case of varying numbers of opinions. The procedure steps were as follows:
1. Choose the number of attributes n.
2. From all attributes, choose a combination of n attributes.
3. Calculate true densities of balanced triads p_D(Θ; n).
4. Generate M random networks for models A_n, E_n, C_n and for each realization calculate densities of balanced triads p_M(Θ; n). M was chosen to be equal to 100.
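The first three steps of this procedure can be sketched as follows. The sketch assumes opinions coded as −1, 0 and +1 (so the maximum difference per attribute is 2), a link being positive when the normalized Manhattan distance does not exceed Θ, and hypothetical function names; the generation of random networks in the last step is omitted.

```python
from itertools import combinations


def balanced_density(opinions, edges, attrs, theta, max_diff=2.0):
    """Density of balanced triads when signs use only the attributes in `attrs`."""
    sign = {}
    for i, j in edges:
        dist = sum(abs(opinions[i][a] - opinions[j][a]) for a in attrs)
        dist /= max_diff * len(attrs)  # assumed normalization to the range [0, 1]
        sign[frozenset((i, j))] = 1 if dist <= theta else -1
    nodes = sorted({v for e in edges for v in e})
    balanced = total = 0
    for trio in combinations(nodes, 3):
        trio_edges = [frozenset(p) for p in combinations(trio, 2)]
        if all(e in sign for e in trio_edges):
            total += 1
            product = sign[trio_edges[0]] * sign[trio_edges[1]] * sign[trio_edges[2]]
            balanced += product > 0  # a positive product marks a balanced triad
    return balanced / total if total else float("nan")


def densities_by_subset(opinions, edges, n, thetas):
    """Steps 1-3: for every combination of n attributes, compute p_D(theta; n)."""
    n_attrs = len(next(iter(opinions.values())))
    return {
        attrs: [balanced_density(opinions, edges, attrs, th) for th in thetas]
        for attrs in combinations(range(n_attrs), n)
    }


opinions = {0: (1, 1), 1: (1, -1), 2: (-1, -1)}
edges = [(0, 1), (1, 2), (0, 2)]
print(densities_by_subset(opinions, edges, 1, [0.0, 1.0]))
print(densities_by_subset(opinions, edges, 2, [0.0, 0.5, 1.0]))
```

In this toy example, the single triad is balanced for every one-attribute subset but becomes unbalanced for small tolerances once both attributes are used.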
In the rest of this section, we present the following results. First, we show the results of p_D(Θ; n) and p_M(Θ; n) for model A_n (Fig. S9). Plots for other models are analogous; therefore, they are not shown. Observations of the curves p_D(Θ; n) and p_M(Θ; n) alone do not allow comparison of real and random networks. Therefore, as in the main paper, the difference (p_D − p_M) is plotted (Fig. S10). To make the plots readable, we omitted error bars. As a consequence, one cannot assess from these plots whether the differences are statistically significant. First-level tests for models A_n and C_n are presented in the main paper. Here, we present the second-level tests for models A_n and C_n and the tests for model E_n (Fig. S11-S13).
Fig. S9 shows that the tendencies of changes of densities of balanced triads are similar for the real and randomized networks. For small tolerance values, the density of balanced triads becomes smaller as more topics are considered. The opposite tendency is observed when Θ > 0.6. These phenomena are caused by the fact that, with low (high) tolerance values and an increasing number of attributes, negative (positive) links become more likely. Consequently, with increasing n, there appear to be more unbalanced (balanced) triads with three negative (positive) links. Fig. S9 presents the results only for model A_n. Plots for models E_n and C_n, not shown here, are analogous. [...] becomes negative (though not significantly) at Θ ≈ 0.7. Increasing values of the difference for small tolerances indicate that multidimensionality is essential to conclude that the model does not describe the real network well.
For model E_n, with the increase of n, the differences converge to the same shape. This shows that multidimensionality is not important for noticing whether the model describes the real network well. Fig. S10b does not present results for E_1 because the model E_1 used in the main paper is slightly different from other E_n models. With a single attribute, only one type of unbalanced triad can be observed. In the main paper, model E_1 is constructed in such a way that this is preserved. When n > 1, all types of triads are possible. Therefore, here we present the results for multidimensional attributes only.
Fig. S10c presents the results of differences for model C_n. With increasing n, two peaks appear: at Θ ≈ 0.2 and at Θ ≈ 0.75. Again, multidimensionality is essential to differentiate the real network from the randomized ones.
Fig. S11 and S13 show the results of the second-level statistical analysis for specific tolerance values for models A_n and C_n. For these datasets, one-dimensional attributes do not allow the measurement of Heider balance interactions. With two opinions, there are attribute combinations that make this measurement possible. With more attributes, it becomes more evident for which tolerances Heider balance interactions can be observed and for which the real network does not differ from random ones. Fig. S12 shows p-values for the hypothesis that the observed densities p_D(Θ; n) are similar to p_M(Θ; n). As for Fig. S10b, we conclude that multidimensionality does not play an important role for model E_n.

S7 Analyzing dynamic properties of data
The following subsections describe the details of the proposed agent-based model (ABM) and the error functions used in the ABM fitting procedure and in comparing results between different models. The code is available here: https://github.com/pjgorski/NetHeider.

S7.1 Agent-based model details
The steps of the proposed agent-based model were described in the main paper. We examined a few model variants, including rewiring of edges (instead of removing and adding new ones) and changing the order of decisions. We presented the model yielding the best agreement with the data. Using the real data, we obtained the average probability of changing an unbalanced triad with one negative edge into a triad with three positive links (i.e., p_n). Other parameters were difficult to calibrate because of many links vanishing in the real data, which could be caused by other factors (as stated in the main paper) unrelated to the analyzed model and dynamics. Therefore, the probability of choosing to remove a link p_r and the probability of adding a new connection p_add were fitted to best match the real data measures. The following values were evaluated: p_r ∈ {0, 0.05, 0.1, ..., 0.3, 0.4, ..., 1} and p_add ∈ {0.01, 0.03, ..., 0.09}. Another fitted parameter is the number of steps the ABM dynamics should run because it is unclear what this value should be. Therefore, during each simulation, we saved the network state every 5 steps up to 400 steps and later used these timestamps in the fitting procedure.
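For illustration, the evaluated parameter grid can be enumerated as in the sketch below. It assumes that p_r advances in steps of 0.05 up to 0.3 and in steps of 0.1 afterwards, that p_add advances in steps of 0.02, and that the first saved state is the one after 5 steps; the variable names are hypothetical, and the sketch is not taken from the NetHeider repository.

```python
from itertools import product

# Assumed reading of {0, 0.05, 0.1, ..., 0.3, 0.4, ..., 1}.
p_r_values = [round(0.05 * k, 2) for k in range(7)] + [round(0.1 * k, 1) for k in range(4, 11)]

# Assumed reading of {0.01, 0.03, ..., 0.09}.
p_add_values = [round(0.01 + 0.02 * k, 2) for k in range(5)]

# Network states saved every 5 steps up to 400 steps.
snapshot_steps = list(range(5, 401, 5))

# Every (p_r, p_add) pair is simulated once per repetition; each saved step
# is then a candidate for the fitted number of ABM steps.
parameter_grid = list(product(p_r_values, p_add_values))

print(len(p_r_values), len(p_add_values), len(parameter_grid), len(snapshot_steps))
```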

Figure S1. Comparison between Manhattan distance distribution among connected (a) and all pairs of agents (b). The panels show the density of pairs of students' distances.

Figure S2. Distribution of Manhattan distances after transforming opinions using PCA.

Figure S3. Signed networks with signs generated from multidimensional opinions based on tolerance Θ. The panels show the network from the first term. Only nodes that are part of triads are shown. The network contains six components consisting of 24, 6, 4, 3, 3 and 3 nodes. Blue and red edges correspond to positive and negative relations, respectively.

Figure S4. Triad histograms for networks with uni-dimensional edges. Panels show (a) exact numbers of balanced and unbalanced triads and (b) exact numbers of specific triad types in networks with signs obtained using a single topic.

Figure S5.

Figure S6.

Figure S7. Histogram of values of the statistic for the model C_8 in the first-level test. 1000 model realizations were used to obtain this figure. The value of the statistic for the real data curve was 13994.5.

Figure S8. Comparison between the density of balanced triads in the case of multi-edges with correlations removed and random models A_8, E_8 and C_8. The shaded areas show one standard deviation.

Figure S9. Model A_n. Densities of balanced triads with varying numbers of attributes considered. Panels show the densities for (a) the real data set, p_D(Θ; n), and (b) the randomized ones, p_M(Θ; n). The legend for both panels is given in the middle. The lines are a guide for the eye. Error bars are not shown to increase the plots' readability.

Table S1. The coding system for the considered topics.

Table S2. The Spearman rank correlation coefficients for original opinions in the 1st term.

Table S3. Elementary statistics of communication networks from each term. Columns represent: N, the number of students actively communicating in each term; E, the number of created edges; d, the graph connection density; T, the transitivity; C_1 and C_2, measures of the average local clustering coefficient. Graph density d and clustering coefficient C_2 consider active students only. The parameter C_1 takes the average over all the students (i.e., 108) included in the analysis.

[...] in the main text. These numbers are listed explicitly in Tab. S3. This table also lists the number of edges, graph density, and the global and local clustering coefficients.